DOCUMENT RESUME

ED 039 149                    24                    SE 008 673

AUTHOR          Mayer, William V.; And others
TITLE           A Formative Evaluation of Biological Science:
                Patterns and Processes, Final Report.
INSTITUTION     Biological Sciences Curriculum Study, Boulder, Colo.
SPONS AGENCY    Office of Education (DHEW), Washington, D.C. Bureau
                of Research.
BUREAU NO       BR-9-H-012
PUB DATE        Mar 70
GRANT           OEG-8-9-150012-2018 (058)
NOTE            262p.

EDRS PRICE      EDRS Price MF-$1.00 HC-$13.20
DESCRIPTORS     Academic Achievement, *Achievement Tests, *Biology,
                Curriculum Development, *Curriculum Evaluation,
                *Evaluation, Research Methodology, *Secondary School
                Science, Student Characteristics
IDENTIFIERS     Biological Sciences Curriculum Study



ABSTRACT 



Reported is a formative evaluation of the Biological Sciences
Curriculum Study "Biological Science: Patterns and Processes",
designed for academically unsuccessful students. "Criterion referenced"
tests were developed, with items selected to indicate the extent of
students' learning rather than to discriminate between students. An
alternate form, pretest-posttest research design was used. Randomly
selected students within classes of teachers who had participated in
feedback and training activities were given alternate test forms for
each of five content areas. Scores on these tests served as the
dependent variables; scores on the Verbal Reasoning and Numerical Ability
subtests of the Differential Aptitude Test and on the Davis Reading Test
served as independent variables. Data were also collected on school and
community characteristics. Analysis of covariance and multiple regression
analysis showed significant differences between classes (tentatively
attributed to teacher performance), and significant correlations between
reading comprehension and achievement. Recommendations are made for
revision of the materials and for similar evaluative studies. Appended
are tables of results and statistical analyses, and copies of tests used.
(EB)



U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.



FINAL REPORT

Project No. 9-H-012
Grant No. OEG-8-9-150012-2018 (058)

A FORMATIVE EVALUATION OF
BIOLOGICAL SCIENCE: PATTERNS AND PROCESSES

William V. Mayer, Project Director
Richard C. Anderson, Consultant
Thomas J. Cleaver, Staff Consultant
James T. Robinson, Staff Consultant
Richard R. Tolman, Staff Consultant
Biological Sciences Curriculum Study Center
University of Colorado
P.O. Box 930
Boulder, Colorado 80302

March, 1970

U.S. DEPARTMENT OF
HEALTH, EDUCATION, AND WELFARE

Office of Education
Bureau of Research






The research reported herein was performed pursuant to a grant with
the Office of Education, U.S. Department of Health, Education, and
Welfare. Contractors undertaking such projects under Government
sponsorship are encouraged to express freely their professional
judgment in the conduct of the project. Points of view or opinions
stated do not, therefore, necessarily represent official Office of
Education position or policy.






BSCS S/M EVALUATION 






TABLE OF CONTENTS

CHAPTER                                                          PAGE

I.    BACKGROUND OF THE STUDY                                       1

II.   RESEARCH DESIGN AND ANALYSIS                                  5
         Research Design                                            5
         Analysis                                                   6

III.  THE TEST POPULATION                                           9

IV.   RESULTS AND DISCUSSION - UNIT I, ECOLOGY                     13
         Results                                                   15
         Discussion                                                22

V.    RESULTS AND DISCUSSION - UNIT II, CELL
      ENERGY PROCESSES                                             23
         Results                                                   23
         Discussion                                                35

VI.   RESULTS AND DISCUSSION - UNIT III,
      REPRODUCTION AND DEVELOPMENT                                 36
         Results                                                   41
         Discussion                                                47

VII.  RESULTS AND DISCUSSION - UNIT IV,
      GENETIC CONTINUITY                                           48
         Results                                                   53
         Discussion                                                59

VIII. IMPACT OF EVALUATION ON PATTERNS AND PROCESSES               60

IX.   CONCLUSIONS AND RECOMMENDATIONS                              62
         Conclusions                                               62
         Recommendations                                           64

REFERENCES                                                         66
APPENDIX A.  t-Test Results                                        68
APPENDIX B.  Item Analysis Results                                 73
APPENDIX C.  Percent Possible Gain Tables                          88
APPENDIX D.  Factor Analysis Results                               94
APPENDIX E.  Multiple Linear Regression Analysis                  103
APPENDIX F.  Residual Gain Results                                125
APPENDIX G.  Summary Tables                                       145
APPENDIX H.  Teacher Addresses                                    164
APPENDIX I.  Teacher Questionnaire                                167
APPENDIX J.  Test Instruments                                     169




LIST OF TABLES

TABLE                                                            PAGE

1.  Number of Students by Teachers, Grade Level, and
    Community Characteristics                                      10

2.  Number and Percentage of Students, and Number of Teachers
    Classes by Community Characteristics                           11

3.  Raw Scores on the Differential Aptitude Test (DAT),
    Form A and the Davis Reading Test (DRT), Form 2C
    for Students with Complete Scores                              12

4.  Logical Design for Items, Form A and Form B, to
    Assess Student Understandings of Ecological Concepts,
    Unit I                                                         13

5.  Raw Scores on the Verbal Reasoning and Numerical Ability
    Tests of the Differential Aptitude Test and of the
    Comprehension and Speed Tests of the Davis Reading Test
    for Students with Both Pretest and Posttests on Unit I,
    Ecology                                                        15

6.  N, Means, Standard Deviations and Adjusted Means of
    37 Classes on Pretest A - Posttest B, Analysis of
    Covariance                                                     17

7.  ANCOVA Table for Differences Between Adjusted Posttest
    Means on Posttest B, Unit I                                    18

8.  N, Means, Standard Deviations and Adjusted Means of
    37 Classes on Pretest B - Posttest A, Analysis of
    Covariance                                                     19

9.  ANCOVA Table for Differences Between Adjusted Posttest
    Means on Posttest A, Unit I                                    20

10. N, Means, Standard Deviations, and Adjusted Means of
    3 Blocks of Students Based on DRT Comprehension
    Percentile Rankings, Unit I                                    21

11. ANCOVA Table for Differences Between Adjusted Posttest
    Means Blocked on DRT Comprehension Percentile
    Rankings, Unit I                                               21

12. Logical Design for Items, Form A and Form B, to Assess
    Student Understandings of Concepts of Cell Energy
    Processes, Unit II                                             23

13. Raw Scores on the Verbal Reasoning and Numerical Ability
    Tests of the Differential Aptitude Test and of the
    Comprehension and Speed Tests of the Davis Reading Test
    for Students with Both Pretest and Posttests on Unit II,
    Cell Energy Processes                                          28

14. N, Means, Standard Deviations, and Adjusted Means of
    31 Classes on Pretest A - Posttest B, Analysis of
    Covariance                                                     30

15. ANCOVA Table for Differences Between Classes on the
    Adjusted Posttest Means on Posttest B, Unit II                 31

16. N, Means, Standard Deviations, and Adjusted Means of
    31 Classes on Pretest B - Posttest A, Analysis of
    Covariance                                                     33



17. ANCOVA Table for Differences Between Classes on the
    Adjusted Posttest Means on Posttest A, Unit II                 33

18. N, Means, Standard Deviations, and Adjusted Means of
    3 Blocks of Students Based on DRT Comprehension
    Percentile Rankings, Unit II                                   34

19. ANCOVA Table for Differences Between Adjusted Posttest
    Means Blocked on DRT Comprehension Percentile
    Rankings, Unit II                                              34

20. Logical Design for Items, Form A and Form B, to Assess
    Student Understandings of Reproduction and Development
    Concepts, Unit III                                             36

21. Raw Scores on the Verbal Reasoning and Numerical Ability
    Tests of the Differential Aptitude Test and of the
    Comprehension and Speed Tests of the Davis Reading Test
    for Students with Both Pretest and Posttests on Unit
    III, Reproduction and Development                              41

22. N, Means, Standard Deviations, and Adjusted Means of
    29 Classes on Pretest A - Posttest B, Analysis of
    Covariance                                                     43

23. ANCOVA Table for Differences Between Classes on the
    Adjusted Posttest Means on Posttest B, Unit III                44

24. N, Means, Standard Deviations, and Adjusted Means of
    29 Classes on Pretest B - Posttest A, Analysis of
    Covariance                                                     45

25. ANCOVA Table for Differences Between Classes on the
    Adjusted Posttest Means on Posttest A, Unit III                46

26. N, Means, Standard Deviations, and Adjusted Means of
    3 Blocks of Students Based on DRT Comprehension
    Percentile Rankings, Unit III                                  47

27. ANCOVA Table for Differences Between Adjusted Posttest
    Means Blocked on DRT Comprehension Percentile
    Rankings, Unit III                                             47

28. Logical Design for Items, Form A and Form B, to Assess
    Student Understandings of Genetic Continuity Concepts,
    Unit IV                                                        48

29. Raw Scores on the Verbal Reasoning and Numerical Ability
    Tests of the Differential Aptitude Test and of the
    Comprehension and Speed Tests of the Davis Reading Test
    for Students with Both Pretest and Posttests on Unit IV,
    Genetic Continuity                                             53

30. N, Means, Standard Deviations, and Adjusted Means of
    25 Classes on Pretest A - Posttest B, Analysis of
    Covariance                                                     55

31. ANCOVA Table for Differences Between Classes on the
    Adjusted Posttest Means on Posttest B, Unit IV                 56

32. N, Means, Standard Deviations, and Adjusted Means of
    25 Classes on Pretest B - Posttest A, Analysis of
    Covariance                                                     57

33. ANCOVA Table for Differences Between Classes on the
    Adjusted Posttest Means on Posttest A, Unit IV

34. N, Means, Standard Deviations, and Adjusted Means of
    3 Blocks of Students Based on DRT Comprehension
    Percentile Rankings, Unit IV

35. ANCOVA Table for Differences Between Adjusted Posttest
    Means Blocked on DRT Comprehension Percentile
    Rankings, Unit IV



LIST OF APPENDIX TABLES

TABLE                                                            PAGE

1.  Results of t-Tests for Differences Between Independent
    Variables, Unit I                                              69

2.  Results of t-Tests for Differences Between Independent
    Variables, Unit II                                             70

3.  Results of t-Tests for Differences Between Independent
    Variables, Unit III                                            71

4.  Results of t-Tests for Differences Between Independent
    Variables, Unit IV                                             72

5.  Item Analysis Results - Unit I, Posttest A                     74

6.  Item Analysis Results - Unit I, Posttest B                     75

7.  Item Analysis Results - Unit II, Posttest A                    76

8.  Item Analysis Results - Unit II, Posttest B                    78

9.  Item Analysis Results - Unit III, Posttest A                   80

10. Item Analysis Results - Unit III, Posttest B                   82

11. Item Analysis Results - Unit IV, Posttest A                    84

12. Item Analysis Results - Unit IV, Posttest B                    86

13. Percent Possible Gain Calculations, Unit I                     90

14. Percent Possible Gain Calculations, Unit II                    91

15. Percent Possible Gain Calculations, Unit III                   92

16. Percent Possible Gain Calculations, Unit IV                    93

17. Factor Analysis Results - Unit I, Posttest A                   95

18. Factor Analysis Results - Unit I, Posttest B                   96

19. Factor Analysis Results - Unit II, Posttest A                  97

20. Factor Analysis Results - Unit II, Posttest B                  98

21. Factor Analysis Results - Unit III, Posttest A                 99

22. Factor Analysis Results - Unit III, Posttest B                100

23. Factor Analysis Results - Unit IV, Posttest A                 101

24. Factor Analysis Results - Unit IV, Posttest B                 102

25. Results of Multiple Linear Regression Analysis for
    Posttest B, Unit I                                            105

26. Results of Multiple Linear Regression Analysis for
    Posttest A, Unit I                                            105

27. Summary Table - Multiple Stepwise Regression Analysis -
    Unit I, Pretest A - Posttest B                                106

28. Summary Table - Multiple Stepwise Regression Analysis -
    Unit I, Pretest B - Posttest A                                108

29. Results of Multiple Stepwise Regression Analysis -
    Correlation Between Independent and Dependent Variables -
    Unit I, Pretest A - Posttest B                                108

30. Results of Multiple Stepwise Regression Analysis -
    Correlation Between Independent and Dependent Variables -
    Unit I, Pretest B - Posttest A                                109

31. Results of Multiple Linear Regression Analysis for
    Posttest B, Unit II                                           110

32. Results of Multiple Linear Regression Analysis for
    Posttest A, Unit II                                           111

33. Summary Table - Multiple Stepwise Regression Analysis -
    Unit II, Pretest A - Posttest B                               112



34. Summary Table - Multiple Stepwise Regression Analysis -
    Unit II, Pretest B - Posttest A                               113

35. Results of Multiple Stepwise Regression Analysis -
    Correlation Between Independent and Dependent
    Variables - Unit II, Pretest A - Posttest B                   113

36. Results of Multiple Stepwise Regression Analysis -
    Correlation Between Independent and Dependent
    Variables - Unit II, Pretest B - Posttest A                   114

37. Results of Multiple Linear Regression Analysis for
    Posttest B, Unit III                                          115

38. Results of Multiple Linear Regression Analysis for
    Posttest A, Unit III                                          115

39. Summary Table - Multiple Stepwise Regression Analysis -
    Unit III, Pretest A - Posttest B                              117

40. Summary Table - Multiple Stepwise Regression Analysis -
    Unit III, Pretest B - Posttest A                              118

41. Results of Multiple Stepwise Regression Analysis -
    Correlation Between Independent and Dependent
    Variables - Unit III, Pretest A - Posttest B                  118

42. Results of Multiple Stepwise Regression Analysis -
    Correlation Between Independent and Dependent
    Variables - Unit III, Pretest B - Posttest A                  119

43. Results of Multiple Linear Regression Analysis for
    Posttest B, Unit IV                                           120

44. Results of Multiple Linear Regression Analysis for
    Posttest A, Unit IV                                           121

45. Summary Table - Multiple Stepwise Regression Analysis -
    Unit IV, Pretest A - Posttest B                               122

46. Summary Table - Multiple Stepwise Regression Analysis -
    Unit IV, Pretest B - Posttest A                               123

47. Results of Multiple Stepwise Regression Analysis -
    Correlation Between Independent and Dependent
    Variables - Unit IV, Pretest A - Posttest B                   124

48. Results of Multiple Stepwise Regression Analysis -
    Correlation Between Independent and Dependent
    Variables - Unit IV, Pretest B - Posttest A                   124

49. Residual Gain - Class Data, Unit I,
    Pretest A - Posttest B                                        127

50. Residual Gain - Class Data, Unit I,
    Pretest B - Posttest A                                        130

51. Residual Gain - Class Data, Unit II,
    Pretest A - Posttest B                                        133

52. Residual Gain - Class Data, Unit II,
    Pretest B - Posttest A                                        135

53. Residual Gain - Class Data, Unit III,
    Pretest A - Posttest B                                        137

54. Residual Gain - Class Data, Unit III,
    Pretest B - Posttest A                                        139

55. Residual Gain - Class Data, Unit IV,
    Pretest A - Posttest B                                        141

56. Residual Gain - Class Data, Unit IV,
    Pretest B - Posttest A                                        143

57. Summary of Mean Scores by Class for the Dependent and
    Independent Variables, Unit I, Pretest A - Posttest B         146

58. Summary of Mean Scores by Class for the Dependent and
    Independent Variables, Unit I, Pretest B - Posttest A         149

59. Summary of Mean Scores by Class for the Dependent and
    Independent Variables, Unit II, Pretest A -
    Posttest B                                                    159

60. Summary of Mean Scores by Class for the Dependent and
    Independent Variables, Unit II, Pretest B -
    Posttest A                                                    154

61. Summary of Mean Scores by Class for the Dependent and
    Independent Variables, Unit III, Pretest A -
    Posttest B                                                    156

62. Summary of Mean Scores by Class for the Dependent and
    Independent Variables, Unit III, Pretest B -
    Posttest A                                                    158

63. Summary of Mean Scores by Class for the Dependent and
    Independent Variables, Unit IV, Pretest A -
    Posttest B                                                    160

64. Summary of Mean Scores by Class for the Dependent and
    Independent Variables, Unit IV, Pretest B -
    Posttest A                                                    162



LIST OF FIGURES 

FIGURE PAGE 

1. The Alternate Form, Pretest-Posttest Design 5 



Chapter 1 



Background of the Study 



There is a large group of students in American schools that, for a
variety of reasons, may be categorized as academically unsuccessful.
Until recently, no concerted effort had been made to delineate the
characteristics of this group and to prepare curricular materials that
provide academic success while maintaining the integrity of the content
and developing its relevance for the student. In 1966, the Biological
Sciences Curriculum Study commercially released a program in biological
sciences for the academically unsuccessful entitled Biological Science:
Patterns and Processes.

Feedback comments from teachers and students suggest that the materials
have been remarkably successful. However, no quantitative data exist to
justify the claims of success for these materials and for the unique
instructional procedures they entail. If this program is to serve as a
model for curriculum development in other disciplines, critical and
objective evaluations of the attainments of students taught with materials
of this type are needed.

The BSCS originally developed three parallel sets of course materials
for high school biology: Biological Science: Molecules to Man (Blue
Version), High School Biology: BSCS Green Version, and Biological Science:
An Inquiry Into Life (Yellow Version). These materials were prepared by
teams of writers working at Summer Writing Conferences during three
successive years: 1960, 1961, and 1962. In the years following each of
the first two summers' work, the materials were widely tested and reviewed
to give feedback for rewriting.

The BSCS became interested in pupils exhibiting poor achievement
during the years the three BSCS Versions were being evaluated (1960-63).

A Special Materials Committee was organized in 1962 to determine the
characteristics of these students and to make recommendations as to how
they might best be taught. The Committee examined and analyzed the
literature on deprived youngsters, school dropouts, and students with
learning problems. Its members interviewed teachers of academically
unsuccessful students and observed them with their classes. Data collected
during the evaluation of the Versions were examined for criteria that
could be used to predict student success.

After an exhaustive study of all these data and materials, the
Committee prepared a plan for the development of materials in biology
that could be expected to be more suitable for these academically
unsuccessful students. The plan included:

(a) writing the materials at a reading level in keeping with
    the students' abilities, while keeping formal reading
    assignments to a minimum, to produce essentially an
    "unbook."

(b) providing interesting activities within a framework of
    multisensory perceptions in the classroom situation.

(c) constructing materials so that they would lead, in small
    steps, from one fact to another, eventually to a
    generalization and then to a new concept.

(d) centering the learning situations around laboratory
    activities as much as possible in order to capitalize upon
    the potential interests and abilities of these students.

(e) structuring the development of selected concepts common to
    all three of the BSCS Versions, but in a manner especially
    suitable to the characteristics of these students.

(f) developing activities and procedures that served to
    demonstrate the role of inquiry in the accumulation of
    knowledge upon which the theories of modern biology are
    based.

With these guidelines, BSCS writing teams, composed of high school
teachers, college biologists, educational psychologists, and science
educators, proceeded to develop experimental materials that were used and
evaluated in classroom situations by a total of 300 teachers and 15,000
students. Several revisions were made prior to commercial publication.
Teachers involved in the 1964-65 evaluation of these materials provided
feedback that was used in the summer of 1965 to guide the final revision,
which is published commercially by Holt, Rinehart and Winston, Inc., under
the title Biological Science: Patterns and Processes.

Curriculum development projects typically rely upon very indirect
data to evaluate instructional materials and teaching procedures. The
prime data are the opinions of teachers. Usually the teacher is asked to
evaluate rather large units of material and whole systems of concepts with
a brief comment and a rating or two. It is not clear what criteria and
what standards of excellence are being applied when a teacher judges a
lesson to be successful or unsuccessful. Student interest may not be
clearly distinguished from student learning. The impressions of a teacher,
as a participant-observer, may be unduly colored by the performance of a
few students. Finally, even when there is consensus that a lesson was
unsuccessful, teacher evaluations will not always help to identify the
gaps in student understanding that must be filled to make the lesson a
success. To be sure, teacher judgments do provide important information,
but they cannot reasonably bear the whole burden of identifying the
strengths and weaknesses of an instructional program, especially if there
are attractive alternatives. One such alternative is the substance of
this study.

Of the several things that should be taken into consideration when
curricula are evaluated, student learning is among the most important.
It seems obvious that a direct measure of student performance will be a
better indicator of learning than teacher impressions. Indeed, there is
evidence that designing lessons on the basis of student performance data
can result in substantially improved instruction.¹

The BSCS, because of its participation in the effort to design
instructional materials for this special purpose and population, is in a
uniquely advantageous position to test the result and, hopefully, to
suggest to educators as a whole the criteria upon which success or failure
in this effort may rest.

This study, therefore, was directed toward the application of more
effective evaluative techniques to assist in improving instruction for a
significant fraction of the school population that has been consistently
neglected. A major purpose of this study was to obtain reliable data on
the effectiveness of the current materials in order to determine which
procedures most improve the impact of these materials on the problems of
teaching significant ideas of modern biology to the academically
unsuccessful student. Subsidiary goals were to demonstrate the
effectiveness of the overall design of these materials and to develop
tests that could eventually be used by teachers for classroom evaluation
of the academically unsuccessful student. Successful completion of this
project may well provide a precedent and an example that other projects
can follow, with other materials, to advance the cause of improved
instruction for all students of the sciences.

One reason for giving achievement tests is to evaluate students; that
is, to rank them, to assign grades, or to predict those likely to do well
in college. When this is the purpose, it is appropriate to use the
classical psychometric model. According to this model, the likelihood of
reliable discriminations between students is maximized when (1) the
correlation of each item score with the total test score is maximized,
and (2) the difficulty level of the items is as close to 50 percent as
possible. When the full-dress treatment is given to the development of an
achievement test, a large pool of items is tried out with a sample from
the population for which the test is intended. Items that perform best in
terms of the two criteria listed above are included in the final version
of the test.
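The two selection criteria of the classical model can be made concrete with a small sketch. The Python below is an illustration only, not the procedure of any particular test developer: it takes an item's difficulty to be the proportion of students answering it correctly and its discrimination to be the correlation between the item score and the total score, and it keeps only items meeting both criteria. The thresholds (difficulty within 0.2 of 0.5, correlation of at least 0.3) and the function names are hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns None if either variable has zero variance."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return None
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def classical_select(responses, max_gap=0.2, min_r=0.3):
    """Return indices of items kept under the classical psychometric criteria.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    max_gap and min_r are hypothetical thresholds for illustration.
    """
    totals = [sum(row) for row in responses]
    kept = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        p = mean(item)               # difficulty = proportion correct
        r = pearson(item, totals)    # item-total (point-biserial) correlation
        if abs(p - 0.5) <= max_gap and r is not None and r >= min_r:
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Item 0: every student correct; item 3: every student wrong.
    responses = [
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 1, 1, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
    ]
    print(classical_select(responses))  # [1, 2]
```

With these sample responses, the item that every student answers correctly (item 0) and the item that every student misses (item 3) are both discarded, which is the loss of information discussed in the paragraphs that follow.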

A second reason for giving achievement tests is to evaluate the 
quality of instruction. Two somewhat different purposes may be 
distinguished. The first is "summative evaluation", so called because the 
purpose is to give a final test to a total instructional package, perhaps 
comparing it to competing programs, in order to provide potential 
consumers with information upon which to make a use decision. The second 
is "formative evaluation", wherein the purpose is to provide information 
to authors or teachers to help them improve the instructional program. 

Of these two goals, this study was concerned primarily with formative
evaluation.

1. R. C. Anderson, "Educational Psychology," Annual Review of
Psychology 18 (1967): 129-164.



It is only in the last few years that it has become clear to
educational researchers that the classical psychometric model is
inappropriate for either summative or formative evaluation.²,³,⁴ The
procedures for selecting items dictated by the model cause the evaluator
to discard items that most students answer correctly. Consequently,
information about which concepts were well learned is lost. More serious,
however, is the fact that items on which everyone does poorly are
eliminated and, therefore, information about the weak points in the
instructional program is systematically destroyed. The better the
instruction that precedes the test, the more likely the test is to contain
tricky, hairsplitting questions on the footnotes rather than the main
themes of instruction. This state of affairs follows from the logic of
the model, which implies that the difficulty level of a test should be 50
percent no matter how much and how well students have learned. Finally,
the criterion that individual items should correlate highly with the
total score biases the selection of items in the direction of those which
measure relatively enduring student traits like verbal ability. At the
same time, this criterion probably involves a bias against selecting
items that are sensitive to immediate situational factors, for instance,
whether the student has been subject to good or poor teaching.

The objections to the classical psychometric model have been 
detailed here because the major course content improvement projects, 
including the Biological Sciences Curriculum Study, have uniformly 
developed achievement tests that are psychometrically "good." 

This study attempted to employ "criterion-referenced" tests. The
sole basis for selecting a test item was whether the student's answers to
the item would indicate the extent to which he understood an important
concept (or could apply a problem-solving skill, use an experimental
technique, etc.). There was no attempt to regulate the difficulty of
items in advance of the research. The whole point of the research was to
identify easy items (the student learned and now understands) and
difficult items (the student did not learn and does not understand).
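A criterion-referenced summary keeps every item and simply reports how well each concept was learned. The Python sketch below assumes hypothetical inputs (a mapping from item identifiers to concepts, and the percent correct on each item at pretest and posttest) and uses a hypothetical 70 percent cut-off only to label concepts for the writers; the study itself did not set such a threshold, and no item is dropped for being too easy or too hard.

```python
from collections import defaultdict


def concept_report(item_concepts, pre_pct, post_pct, cutoff=70.0):
    """Average item percent correct by concept, keeping every item.

    item_concepts: {item_id: concept}
    pre_pct, post_pct: {item_id: percent correct}
    cutoff: hypothetical posttest level used only for labeling.
    """
    groups = defaultdict(list)
    for item, concept in item_concepts.items():
        groups[concept].append(item)

    report = {}
    for concept in sorted(groups):
        items = groups[concept]
        pre = sum(pre_pct[i] for i in items) / len(items)
        post = sum(post_pct[i] for i in items) / len(items)
        label = "well learned" if post >= cutoff else "needs attention"
        report[concept] = (round(pre, 1), round(post, 1), label)
    return report


if __name__ == "__main__":
    concepts = {"q1": "food chains", "q2": "food chains", "q3": "energy flow"}
    pre = {"q1": 30, "q2": 40, "q3": 20}
    post = {"q1": 85, "q2": 75, "q3": 35}
    for concept, row in concept_report(concepts, pre, post).items():
        print(concept, row)
```

Here the "energy flow" item, which a classical analysis might discard as too difficult, is retained and flagged, which is precisely the information a formative evaluation needs.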



2. R. Glaser, "Instructional Technology and the Measurement of
Learning Outcomes: Some Questions," American Psychologist 18 (1963):
519-521.

3. Richard C. Cox and Julie S. Vargas, A Comparison of Item Selecting
Techniques for Norm-referenced and Criterion-referenced Tests
(Pittsburgh, Pennsylvania: Learning Research and Development Center,
University of Pittsburgh, February, 1966).

4. Ralph W. Tyler, Robert M. Gagne, and Michael Scriven, Perspectives
of Curriculum Evaluation, American Educational Research Assn.
Monograph Series (Washington, D.C.: Rand McNally & Co., 1967).



Chapter 2 

Research Design and Analysis 



Research Design 

The course materials, Biological Science: Patterns and Processes,
were divided into five areas of study: ecological relationships, cell
energy processes, reproduction and development, genetic continuity,
and organic evolution. Each area of study was analyzed for significant
concepts that served as guides for developing test items. Items were
sorted into two test forms, A and B, of equal length, with at least one
item for each concept. In this way alternate forms for each of five unit
tests were developed. Randomly selected students within each class were
administered these alternate forms for each unit test as shown in Figure 1.
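The within-class random assignment can be sketched as follows. This Python example is a hypothetical illustration, not the procedure recorded in the report: it shuffles each class roster and then alternates down the shuffled list, so that the two subgroups of Figure 1 differ in size by at most one student. The function name, the roster format, and the seed parameter are assumptions.

```python
import random


def assign_forms(class_rosters, seed=None):
    """Randomly split each class into the two subgroups of the alternate form design.

    Subgroup 1 takes Form A as the pretest and Form B as the posttest;
    subgroup 2 takes Form B first and Form A second.
    Returns {student_id: (class_id, subgroup, pretest_form, posttest_form)}.
    """
    rng = random.Random(seed)
    plan = {}
    for class_id, students in class_rosters.items():
        order = list(students)
        rng.shuffle(order)                      # random order within the class
        for position, student in enumerate(order):
            if position % 2 == 0:               # alternate down the shuffled list
                plan[student] = (class_id, 1, "A", "B")
            else:
                plan[student] = (class_id, 2, "B", "A")
    return plan


if __name__ == "__main__":
    rosters = {"class 1": ["s1", "s2", "s3", "s4", "s5"],
               "class 2": ["t1", "t2", "t3", "t4"]}
    for student, assignment in sorted(assign_forms(rosters, seed=7).items()):
        print(student, assignment)
```

Alternating after a shuffle, rather than flipping a coin for each student, is one way to keep the subgroups balanced within every class while still leaving membership to chance.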



Classroom Subgroups    Pretest Form    Instruction    Posttest Form

Subgroup 1             A               X              B
Subgroup 2             B               X              A

Figure 1. The Alternate Form, Pretest-Posttest Design



This alternate form, pretest-posttest design eliminated the
facilitation of posttest performance that occurs in a single-form design,
in which students meet the same items on the pretest and the posttest.
It had the additional advantage that data on twice as many items were
obtained with the same investment of student time. The number of items
per student is an important consideration when the purpose is to
discriminate among students, but when the goal is to discriminate between
well-learned and not well-learned concepts, the total number of items
becomes paramount.
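How the crossover doubles the item coverage can be seen by tallying results item by item. In the Python sketch below, the record layout is hypothetical and item identifiers are assumed to be unique across Forms A and B. Each student answers only one form on each occasion, yet every item on both forms receives a pretest percentage from one subgroup and a posttest percentage from the other.

```python
from collections import defaultdict


def item_pre_post(records):
    """Percent correct on each item at pretest and at posttest.

    Each record is one student: {"pre": {item: 0 or 1}, "post": {item: 0 or 1}}.
    Returns {item: (pretest_percent, posttest_percent)}; None means no data.
    """
    # For each item and occasion: [number correct, number attempted]
    counts = defaultdict(lambda: {"pre": [0, 0], "post": [0, 0]})
    for record in records:
        for phase in ("pre", "post"):
            for item, score in record[phase].items():
                counts[item][phase][0] += score
                counts[item][phase][1] += 1

    def percent(tally):
        correct, attempted = tally
        return 100.0 * correct / attempted if attempted else None

    return {item: (percent(c["pre"]), percent(c["post"]))
            for item, c in sorted(counts.items())}


if __name__ == "__main__":
    records = [
        {"pre": {"A1": 0, "A2": 1}, "post": {"B1": 1, "B2": 1}},  # subgroup 1
        {"pre": {"B1": 0, "B2": 0}, "post": {"A1": 1, "A2": 1}},  # subgroup 2
    ]
    for item, (pre, post) in item_pre_post(records).items():
        print(item, pre, post)
```

Because pretest and posttest percentages for a given item come from different, randomly formed subgroups, no student's posttest answer on that item can be influenced by having seen it on the pretest.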

A control group was not required, because the purpose of the study was
to identify the effects of instruction with particular materials so that
the materials could be revised and suggestions for teacher adaptation of
the materials could be made.

The five pairs of alternate form multiple-choice tests served as the
dependent variables in the study. The Verbal Reasoning (VR) and Numerical
Ability (NA) subtests of the Differential Aptitude Test (DAT), Form A,
and the Comprehension and Speed tests of the Davis Reading Test (DRT)
served as independent variables. In addition to the test data on the
student population, community characteristics and school district size
were secured through a teacher questionnaire and from published
statistical data.²



1. See Chapters 4-7.

2. Gerald Kahn and Warren Hughes, "Statistics of Local Public School
Systems, 1967. Fall 1967: Pupils Schools/Staff. 1966-67:
Expenditures." National Center for Educational Statistics, Government
Printing Office, Washington, D.C.: Superintendent of Documents,
March, 1969.



Analysis 



The data were subjected to statistical analysis on the CDC 6400
computer at the University of Colorado, Boulder.

The initial analysis was run early in June to provide data for the
writing team to use in improving the Revised Edition of Patterns and
Processes. Output from the initial run included the percent correct on
the pretest and posttest, and the percent possible gain³ for the groups
of items comprising each of the concepts on each test administered.
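Appendix C reports the percent possible gain calculations actually used in the study. The sketch below is an assumption rather than a transcription of those calculations: it uses the conventional definition, in which the observed gain is expressed as a percentage of the gain still possible after the pretest, applied to the mean percent correct of a group of items.

PPG = 100 × (posttest percent − pretest percent) / (100 − pretest percent)

```python
def percent_possible_gain(pre_pct, post_pct):
    """Observed gain as a percentage of the gain that was possible.

    pre_pct, post_pct: percent correct (0-100) for a group of items.
    Returns None when the pretest is already at 100 percent, because
    no gain was possible. Negative values indicate a loss.
    """
    if pre_pct >= 100:
        return None
    return 100.0 * (post_pct - pre_pct) / (100.0 - pre_pct)


if __name__ == "__main__":
    print(percent_possible_gain(40, 70))   # 50.0
    print(percent_possible_gain(50, 40))   # -20.0
```

Dividing by the room left for improvement keeps concepts with high pretest scores from appearing to be poorly taught simply because little gain was possible.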

Later, a complete item analysis was run on each test using the
FORTAP (Fortran Test Analysis Package) program developed by Baker and
Martin⁴ and modified by personnel of the Laboratory of Educational
Research, University of Colorado, Boulder. Data obtained with this
program included the mean, standard deviation, standard error, and a Hoyt
reliability estimate for each test. In addition, difficulty (% correct),
biserial r, X50-values, and β-values were printed out for every response
on every test item.
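Hoyt's reliability estimate treats a test as a students-by-items analysis of variance. The Python sketch below is hypothetical and is not the FORTAP code: it computes the proportion of the between-students mean square that is not error, r = (MS_students − MS_residual) / MS_students. This quantity is algebraically identical to Cronbach's alpha, and for items scored right or wrong it equals KR-20.

```python
def hoyt_reliability(scores):
    """Hoyt's analysis-of-variance reliability estimate.

    scores[s][i] is the score of student s on item i (e.g. 1 or 0).
    Assumes at least two students and two items, and that total scores
    are not all equal (otherwise MS_students is zero).
    """
    n = len(scores)        # students
    k = len(scores[0])     # items
    grand = sum(map(sum, scores)) / (n * k)
    student_means = [sum(row) / k for row in scores]
    item_means = [sum(row[i] for row in scores) / n for i in range(k)]

    ss_students = k * sum((m - grand) ** 2 for m in student_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_students - ss_items   # students x items interaction

    ms_students = ss_students / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return (ms_students - ms_residual) / ms_students


if __name__ == "__main__":
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(round(hoyt_reliability(data), 3))   # 0.75
```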

The results of the initial FORTAP analysis were carefully scrutinized,
and some items were eliminated on the basis of logical, factual, or
structural errors in the item itself. Before any subsequent analysis was
conducted, each correct response was given a weight of 4 to compensate for
guessing, and all tests, with "bad items" deleted, were rerun on the FORTAP
program to yield more accurate reliability estimates. Punched output,
including weighted (×4) scores for each item and the final score, was
obtained for each student. The cards with weighted scores were matched with
cards containing DAT and DRT data. Only those students for whom complete
data (pretest, posttest, DAT, and DRT) were available were used in the
subsequent analysis.
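The card-matching step amounts to a join on student identifiers that keeps complete cases only. In the Python sketch below, the dictionary layout and the field names ("VR", "NA", "comprehension", "speed") are hypothetical; the sketch multiplies the number of correct responses by the weight of 4 described above and drops any student lacking a pretest, a posttest, DAT scores, or DRT scores.

```python
def build_analysis_file(test_scores, dat_scores, drt_scores, weight=4):
    """Join weighted test scores with DAT and DRT scores, keeping complete cases.

    test_scores: {student_id: {"pre": number_correct or None, "post": number_correct or None}}
    dat_scores:  {student_id: {"VR": raw, "NA": raw}}
    drt_scores:  {student_id: {"comprehension": raw, "speed": raw}}
    """
    complete = {}
    for sid, test in test_scores.items():
        pre, post = test.get("pre"), test.get("post")
        # Listwise deletion: every source must have data for this student.
        if pre is None or post is None or sid not in dat_scores or sid not in drt_scores:
            continue
        complete[sid] = {
            "pre": weight * pre,      # each correct response weighted by 4
            "post": weight * post,
            **dat_scores[sid],
            **drt_scores[sid],
        }
    return complete


if __name__ == "__main__":
    tests = {"s1": {"pre": 10, "post": 15}, "s2": {"pre": 8, "post": None},
             "s3": {"pre": 9, "post": 12}}
    dat = {"s1": {"VR": 20, "NA": 18}, "s2": {"VR": 15, "NA": 12},
           "s3": {"VR": 22, "NA": 19}}
    drt = {"s1": {"comprehension": 30, "speed": 25},
           "s2": {"comprehension": 28, "speed": 20}}   # s3 has no DRT card
    print(build_analysis_file(tests, dat, drt))
```

Only student s1 survives in the example: s2 lacks a posttest and s3 lacks DRT scores.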

A factor analysis was run on each test, by computer programs, to
determine whether or not the items grouped in each concept were loading
on similar factors. The raw data were processed by the BMD03M General
Factor Analysis Program.⁵ The BMD03M performs a principal component
solution and an orthogonal rotation of the factor matrix. Communalities
were estimated from the squared multiple correlation coefficients (r²).
Output from the 03M included the mean and standard deviation of each
variable, the correlation matrix, eigenvalues including cumulative
proportions of total variance, eigenvectors, and a factor matrix. The
Harris-Kaiser factor analysis program performed an oblique

3. See Appendix C.

4. F. B. Baker and T. J. Martin, FORTAP: A Fortran Test Analysis
Package, Laboratory of Experimental Design, Wisconsin Research and
Development Center for Cognitive Learning, The University of
Wisconsin, March 1, 1968.

5. W. J. Dixon, ed., BMD Biomedical Computer Programs (Berkeley:
University of California Press, 1968), pp. 169-184.



